An Hybrid Data Stream Summarizing Approach by Sampling and Clustering
نویسندگان
چکیده
Computer systems generate a large amount of data that, in terms of space and time, is very expensive even impossible to store. Besides this, many applications need to keep an historical view of such data in order to provide historical aggregated information, perform data mining tasks or detect anomalous behavior in the computer systems. One solution is to treat the data as streams that can be processed on the fly in order to build historical summaries. Many data summarizing techniques have already been developed such as sampling, clustering, histograms, etc. Some of them have been extended to be applied directly to data streams. This chapter presents a new approach to building such historical summaries of data streams. It is based on a combination of two existing algorithms: StreamSamp and CluStream. The combination takes advantages of the benefits of each algorithm and avoids their drawbacks. Some experiments are presented both on real and synthetic data. These experiments show that the new approach gives better results than using any one of the two mentioned algorithms.
منابع مشابه
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks have led to huge complex datasets such as time series and trajectories. As a result it is essential to use appropriate methods to analyze the produced large raw datasets. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
متن کاملIntrusion Detection based on a Novel Hybrid Learning Approach
Information security and Intrusion Detection System (IDS) plays a critical role in the Internet. IDS is an essential tool for detecting different kinds of attacks in a network and maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...
متن کاملA Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaning clusters or groups. Due to the vast usage of clustering algorithms in many fields, a lot of research is still going on to find the best and efficient clustering algorithm. K-means is simple and easy to implement, but it suffers from initialization of cluster center and hence trapped in local optimum. In this paper...
متن کاملGenerating Optimal Timetabling for Lecturers using Hybrid Fuzzy and Clustering Algorithms
UCTTP is a NP-hard problem, which must be performed for each semester frequently. The major technique in the presented approach would be analyzing data to resolve uncertainties of lecturers’ preferences and constraints within a department in order to obtain a ranking for each lecturer based on their requirements within a department where it is attempted to increase their satisfaction and develo...
متن کاملElectrofacies clustering and a hybrid intelligent based method for porosity and permeability prediction in the South Pars Gas Field, Persian Gulf
This paper proposes a two-step approach for characterizing the reservoir properties of the world’s largest non-associated gas reservoir. This approach integrates geological and petrophysical data and compares them with the field performance analysis to achieve a practical electrofacies clustering. Porosity and permeability prediction is done on the basis of linear functions, succeeding the elec...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009